Estimating a bivariate linear relationship* 

David Leonard^ 



Abstract 

Solutions of the bivariate, linear errors-in-variables estimation problem with un- 
specified errors are expected to be invariant under interchange and scaling of the 
coordinates. The appealing model of normally distributed true values and errors is 
unidentified without additional information. I propose a prior density that incorpo- 
rates the fact that the slope and variance parameters together determine the covariance 
matrix of the unobserved true values but is otherwise diffuse. The marginal posterior 
density of the slope is invariant to interchange and scaling of the coordinates and de- 
pends on the data only through the sample correlation coefficient and ratio of standard 
deviations. It covers the interval between the two ordinary least squares estimates but 
diminishes rapidly outside of it. I introduce the R package leiv for computing the 
posterior density, and I apply it to examples in astronomy and method comparison. 
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1 Introduction 



Simple linear relationships inspire much empirical research, yet how to estimate their pa- 
rameters is a topic of continuing debate. Longstanding examples include the permanent 



income model in economics (Zellner 1971 ), cosmic distance scale applications in astron- 
omy (|Isobe et aL||1990[), and allometric studies in biology ([Warton et al.[|2006l). From a 



statistical perspective, the common goal of these investigations is to estimate the slope re- 
lating two variables that are observed with error. The controversy stems from the absence 
of an estimate that is invariant to interchange and scaling of the coordinates and depends 
reasonably on their joint distribution. 

In one of the earliest comprehensive reviews, [Madansky] ( 1959 1 fixes ideas with the 
familiar problem of estimating the density of a solid by fitting a line to measurements of 
the mass and volume of a number of specimens. In this problem, the density estimate 
should not depend on which axes the variables are plotted. It should also not depend on 
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the units of measurement. That is, the same inference should be made by applying a scale 
conversion to the data before fitting or to the density estimate afterward. The observations 
may be affected by measurement errors as well as errors intrinsic to the specimens, such 
as contamination by unknown impurities. 

In many applications, the linear relationship appears on the log-log scale, hi such in- 
stances, units of measurement do not affect the slope of the fitted line, so it may seem that 
scale invariance is unnecessary. |Warton et al.| ( |2006| ) point out, however, that many mul- 
tiplicative relationships involve arbitrary powers of the variables that translate to scale 
changes upon log transformation. They offer that in an allometric analysis of certain 
saplings, for example, analyzing the relationship between height and basal diameter or 
basal area should lead to the same scientific conclusions. Similar considerations apply in 



the analysis of the Faber- Jackson relation. Section 4.2 



The ordinary least-squares (OLS) estimate is scale invariant but not invariant to in 
terchange of the coordinates. The orthogonal regression estimate, proposed by [Adcock 



( |1877[p^78| ) and IPearson] ( | 1 90 1 1 ) , is invariant to interchange of the coordinates, but it was 



famously criticized by Wald ( 1940 1 for its lack of scale invariance. The economist Samuel- 



son 



(1942]) proposed the additional property of dependence only on the sample correlation 
coefficient and ratio of standard deviations. He showed that the only point estimate of 
the slope exhibiting these particular invariance and dependence properties is the geometric 
mean of the two OLS estimates, an estimate he credited to |Frisch| ( |1934[ ). This estimate, 
which is equal to the ratio of standard deviations of the measurements, depends on their 
joint distribution only for its sign, however. 

Dependence on the correlation coefficient and ratio of standard deviations is especially 
appealing in the model of normally distributed true values contaminated by normally dis- 
tributed errors. Reiers0l| ( 1950, ) demonstrated that this model is unidentified; the sampling 
density identifies not a point but a continuum of estimates. In some situations, such as in 
pure measurement error problems, supplementing the data with replicate measurements 
may solve the identification problem. In others, the additional information must come 
from outside the sample. A prior density would provide a natural way to incorporate it. 

This article will show how to assign a prior density jointly to the slope and variance 
parameters that leads to a marginal posterior density of the slope that is invariant under in- 
terchange and scaling of the coordinates and has sufficient statistics in the sample correla- 
tion coefficient and ratio of standard deviations. Passage to an appropriate noninformative 
limit is possible at the very end of the calculation. In contrast, previous Bayesian solutions 
have relied on independent, informative prior densities for the variance parameters (see, 
for example, |Zellner[ |1971[ [Polasek and Krause} |1993[ ). 

In one of the earliest reported Bayesian analyses, Lindley and El-Sayyad| ( |1968| ) pre- 
dicted some general properties of the marginal posterior density without fully specifying 
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the prior density. These properties, notably including failure to concentrate around a sin- 
gle value in the limit of infinitely large samples, are indeed exhibited by the fully specified 
solution that follows. 

This introduction has mentioned only a few relevant developments in the long history 
of this problem. [Madansky | ( fT959l ) , | Anders"on| ( | 1 9 84ap , [Sprentj ( [^90] ) and StefansEl ( |2000l ) 
provide comprehensive reviews. The book by Fuller ( |1987 ) has become a standard refer- 
Isobe et aL] ( |1990| ) describe applications in astronomy, biology, chemistry, geology 



ence. 



and physics. 



2 Formulation of the problem 

I adopt the notation of Zellner's ( |197 1i Chapter 5) comprehensive presentation. The data 
are n pairs {yu^yn}, viewed as independent, noisy observations of their unobserved true 
values {<^h-,<^2/} 

y\i = ^\i + uu, 

i = 1, . . . ,n. The elements of each pair of true values are linearly related, 

& = a + /3^n, (2) 

and I seek an estimate of the slope j8 and possibly the intercept a. 

I consider the model in which the true values {^u} and {^2i} are samples from a bi- 
variate normal distribution, degenerate to the regression line, and the total errors and 
{u2i} are independently and normally distributed with mean zero and respective variances 
of and a|. Despite outward appearances, assuming that either {^u} or {^2i} are samples 
from an improper constant density imposes severe restrictions on the resulting solutions. 



Zellner ( 1971 1 showed that the model in which {^u} are samples from an improper con- 
stant density produces the same estimates of /3 and a as OLS regression of y2 on yi. He 
explained that the infinite variance of the distribution of {^n} makes the variance of the 
distribution of errors {uu} negligible in comparison. Likewise, the model in which {<^2/} 
are samples from an improper constant density produces the same estimates as OLS re- 
gression of yi on y2. Indeed, any model that assumes an improper constant density along 
a line in the plane of true values presupposes a direction of ignorable error. The normal 
model may be appropriate in case such specific information is not available. 

Denoting the distribution of {^u} by N(/ii, T^), the sampling distribution of the obser- 
vations {yu^yii} is bivariate normal with mean 

ju = (jUi,a + /3jUi) (3) 
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and covariance 



(4) 



Reiers0l ( 1950 1 demonstrated the consequence of the fact that the sampling distribution 
has six unknown parameters, but the sample mean and covariance matrix provide only 
five sufficient statistics. Put simply, /3 is not identifiable in the normal model without 
additional information. OLS regression overcomes the difficulty by assuming one of C7i 
or 02 is zero, while orthogonal regression assumes the ratio O2/01 = I or, more generally, 
a known constant. These assumptions are more than adequate to estimate /3; reducing 
the number of unknown parameters by one provides estimates of the remaining five. The 
focus of the present effort is to find out how much the form of E can tell us about /3 alone. 

3 The posterior probability density 

Ultimately, I will estimate j8 from the posterior density 

where J is the nx2 observation matrix (3^1,^2)5 PiP) is a prior density defined later in this 
section, and 



Piy\^) = j ■■■ j Piy \ Ail,a,/3,T^al^a2 



,2^ 

(6) 



X p{iii,a,T^,of,02 I /3)djUidadT^daf da| 

is the reduced sampling density. 

As discussed in Section |2[ the full sampling density in the integrand of ([6]) is bivariate 
normal 

= (2;r)-"/2|Er"/2exp{-itr[(n(3;-Ai)"'(y-M) + v5)E-i]}, 

where y= (y 1 , 3^2) is the vector of sample means, v = n — l, and S is the sample covariance 
matrix with divisor n — I. 

I factor the conditional prior density of the location and variance parameters in (|6]) as 

p{Hi,a,x^,af,ai \ I5)=p{^i,a\ T^,af,a^,[5)p{T^,af,a^ \ /3) (8) 

and take the conditional prior density of the location parameters jii and a to be a constant. 
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Previous Bayesian analyses have gone forward under the assumption that various func- 
tions of the variance parameters, for example, the ratio 02/0^, are approximately known 
(see, for example, [Zellner 1971 Section 5.4; Polasek and Krause, 1993). In the absence 
of such knowledge, the fact that E links the variance parameters to the slope should not be 
ignored. In particular, from Q, nonnegativity of the variance parameters 



t2 = 1ei2 > 0, 



/3 

a?=En-^Ei2>0, (9) 

(7|-E22-/3Ei2>0, 

modifies the domain of E given /3, and this information can be incorporated by assigning a 
conditional prior density to E given /3 after changing variables />(t^, , a| | /3) = |/3 |p(E | 
/3). 

Assigning an inverted Wishart density to /»(E | jS) acknowledges that E generates the 
Wishart distributed sample covariance S ( Anderson] |1984b| , Chapter 7). However, because 
the domain of E depends on /3 , the inverted Wishart form 

/;(E|/3)=i^(/3,Vo,^o(/3))-i|Er(^«+3)/2exp[-itr(vi'o(i8)E-i)] (10) 

features a normalization factor that is a function of /3. It also introduces a degrees of 
freedom parameter Vq, a correlation parameter po, and a scale parameter Kq with units of 
y\ through the precision matrix 

W) = v„.j(p;^ f). (11) 

Fortunately, it will be possible to take the limit Vq — at the very end of the calcula- 
tion, removing any information these parameters carry, for any — 1 < po < 1 and Kq > 0. 
The normalization factor ^(j8, Vo,^o(j8)) is the crux of the method; it is calculated in 
Appendix 1. 

After using (|7|)-([T0l) to carry out the integrations in ([6]), the posterior density ([5]) is 

„(R\,.)- p(/3m/3,vo + v,^o(/3) + v5)/i^(/3,vo,^o(/3)) 

^^'^'^^ r^p(/3)i^(/3,vo + v,^o(/3)+v5)/if(i8,Vo,^o(/3))d/3- ^ 

Appendix 2 shows that in the limit Vq — 0, 
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where the sample correlation coefficient r = Su/ {811822)^^'^ and the ratio of standard 
deviations / = (5'22/'S'ii)^/^ are sufficient statistics, and 

7(i8,V,r,Z)=/(|/3|//,v,rsign(/3))+/(Z/|/3|,v,rsign(j8)). (14) 



In ([14]), 

/(i8,v,r)= / Mt;v)Pf{F{t,l5,v,ry,v + hv-l)dt, (15) 

where pt{t; v) is the Student t probability density function, Pf{F; Vi, V2) is the F cumula- 
tive distribution function. 



t_{v,r) = -^/vr/V'^-r^, (16) 



r+(j8,V,r) = v^(/3-r)/Vl-r2, (17) 

and 

V — 1 V -\-t^ 

F{t,l5,v,r) = ^-— . —— -. (18) 

V + 1 [r+(/3,v,r)-f_(v,r)]2-[^-r_(v,r)]2 



Importantly, ( [T4] ) shows that J{[5,v,r,l) = 7(/3//,v,r, 1), so the entire role of j8 is 



mediated by the scale invariant parameter j8 = /3//. In terms of /3 the posterior density 



([13]) is 

pil5\y)=piP\y)/l, (19) 

where 

p0ly)= "(/^M-^---" , ,20) 
r„M|3W,v,r,l)dj3 

depends on the data only through the sample correlation coefficient. 

It remains to assign the prior density p(j8). The pure number /3 ensures scale invari- 
ance. At a minimum, the prior specification should be invariant under interchange of the 
coordinates. The sampling density ([7]), however, is invariant under continuous rotations 
of the coordinate plane, and for now I assume that the prior information available on /3 
is indifferent to such rotations as well. I will return to this point briefly in Section |5] A 
rotationally invariant prior density will necessarily be invariant under interchange of the 
coordinates, by rotation through the angle n/2. 

Under rotation of the coordinates through an angle cp, a rotationally invariant prior 
density must satisfy the functional equation p(/3) = p(/3') |d/3'/d/3|, where j8' = (/3 cos (p — 
sin (p) I (cos 9 + j8 sin 9). Conveniently, there is only one solution, the Cauchy density 
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equivalent to a uniform density on the angle 9 = arctan j8 . Appendix 3 shows that the 
resulting posterior density ([T9|)-(|2T|) has precisely the form required for a density that 
depends on the data only through the sample correlation coefficient and ratio of standard 
deviations to be invariant under interchange and scaling of the coordinates. 

The function 7(j8 , v, r, Z) in ( [T4| ) is a sum of two integrals, one over the sampling density 
of the estimate rl of j8 in the OLS regression of y2 on yi with v degrees of freedom, and the 
other over the sampling density of the estimate r// of 1 //3 in the OLS regression of yi on y2 
with V degrees of freedom. These integrals are well-defined for n > 2. As n becomes large, 
the Student t density in the integrand of I{P /I, v, r) becomes more sharply peaked around 
t = 0, while the F cumulative distribution function becomes more like a unit step function 
at F(?,/3//, V, r) = 1. Consequently, this integral contributes little to 7(j8, V,r,Z) unless 
the point f = is in : F(?,j8//,v,r) > 1}. From definitions ([T6l)-([T8]), this condition is 
met whenever |/3 1 < /. By the same reasoning, the integral /(Z/j8, v, r) contributes little to 
7(j8, V, r, /) unless |j8 1 > /. In other words, for |/3 1 < /, the posterior density is based largely 
on the sampling density of the estimate rl of /3 in the OLS regression of y2 on yi, whereas 
for 1/3 1 > /, it is based largely on the sampling density of the estimate r/Z of l/j8 in the 
OLS regression of on 3;2- 

In special cases, the integrals in 7(/3, v, r, /) can be evaluated analytically. For instance. 
Appendix 4 provides closed form expressions for the posterior density ([19]) for sample 
sizes of « = 4 and n = 6. For n = 4, the posterior density of = arctan /3 is proportional 
to |sin20|/(l — rsin20), —n/l < < 7r/2. For n = 6, the posterior density of is pro- 
portional to I sin 20 1 (2 — r sin 20) /(I — rsin20)^. These densities have relative maxima at 
6 = ±71 / 4 and absolute maximum at = sign(r) 7r/4. 

More generally, the posterior density of the scale invariant slope ( pO] ) is illustrated in 
FigureJTJfor sample sizes of n = 10 and n = 100. Notable features include the symmetry 
about /3 = for r = and the concentration about j8 = ±lasr— T-il. For \r\ < 1 , however, 
the width does not shrink to zero as n — t- oo. Figure [T] also shows the posterior density of 
the corresponding angle 6 = arctan j8, for which the prior density ( [2T] ) is uniform. The R 
package leiv ( |R Development Co re Tearn 2011, Leonard[ |2011[ ) computes the posterior 
density ([T9l)-(pT]) and is freely available from the Comprehensive R Archive Network 
(CRAN). 



4 Examples and simulations 



4.1 Zellner's artificial data 



The present example, using the artificial data of |Zellner| ( |1971[ Table 5.1), compares the 
posterior density (T9\ to Zellner's informed solution. The data are n = 20 pairs {yu,y2i}. 
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Figure 1 : Posterior densities of the scale invariant slope /3 (left) computed from ([20l)-([2T|) 
and of the corresponding angle 9 = arctan /3 (right) for sample sizes ofn = 10 (upper) and 
n = 100 (lower) and a series of sample correlation coefficients. 



generated from the model Q and (|2]), with slope /3 = 1, intercept a = 2, error variances 
of = 4 and a| = 1, true means /ii = 5 and /i2 = a + /3/ii = 7, and true variance T^=16. 
These data meet all the assumptions of Section[2} The sufficient statistics are r = 0-909 and 
/ = 0-963. The posterior density ([19]) is plotted in Figure |2j The posterior median is 0-963; 
the shortest 95% probability interval is (0-722, 1-237). For comparison, the 95% confi- 
dence intervals are (0-676, 1-075) from the OLS regression of y2 on yi and (0-864, 1-372) 
from the OLS regression of yi on 3^2 • 
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Figure 2: Posterior density of the slope /3 (solid) calculated from ([T9l)-([2T]) and (dashed) 



calculated by Zellner ( 1971[ Figure 5.4) using an informative prior density. 



Figure [2] also illustrates the posterior density that Zellner ( 19711 Figure 5.4) calcu- 
lated from the same data, assuming a uniform prior density for j8 and an independent, 
inverted gamma prior density for the true error variance ratio with mean 0-246 and stan- 
dard deviation 0-152. Zellner's posterior density is slightly narrower, due primarily to the 
informative prior density on the variance ratio. It is shifted somewhat to the right, due in 
part to the uniform prior density for /3, which is not rotationally invariant and favors angles 
approaching ±n/2. 
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4.2 Faber- Jackson relation 

The following example illustrates the dilemma posed by estimates that do not possess the 
same symmetries as the problem statement. The data are the luminosities L and velocity 



dispersions a of n = 40 elliptical galaxies obtained from Schechter's ( 1980 ) measurements 
of the Faber- Jackson relation, L ~ a^, as presented by Isobe et al. ( 1990', Section 4). As 



these authors explain, theoretical predictions of /3 range from 2 to 3 to 4. Figure |3] is a 
log-log plot of L versus o. The figure strongly suggests a linear relationship, although 
there is considerable scatter, due primarily to uncharacterized intrinsic processes. 

The popular OLS bisector and orthogonal regression estimates of /3 used by astrono- 
mers in this and other cosmic distance scale applications are invariant under interchange of 
the coordinates. The OLS bisector line bisects the angle between the two OLS regression 
lines, also shown in Figure [3} with slopes b\ = S^/Sw = 2-4 (0-4) and b2 = S22/S12 = 
5-4 (1-0). It has slope ^olsb(^i,^2) = tan(i(arctanZ7i arctanZ72)) = 3-4 (0-4). The 
orthogonal regression line minimizes the sum of the squared perpendicular distances to the 
data. It has slope Z7or(^i,^2) = 5 + sign(5i2)(52 + 1)^/^ = 5-2 (1-0), where 5 = 2(^2- 
l/bi). Here and in the following the standard errors are estimated from 10"^ bootstrap 
replicates. 

The OLS bisector and orthogonal regression estimates of /3 are not invariant under 
scaling of the coordinates. In theoretical developments of the Faber- Jackson relation, the 
velocity dispersion generally enters raised to the second power via the kinetic energy, and 
in general one would expect the analysis of L ~ (c7^)^/^ to lead to the same estimate of 
/3 for any k > 0. This translates under log transformation to scale invariance. Defining 

^olsB/k = boLSBibi/kM/k), the limits Z^S^gg = l/ib^' +b^')= 3-3 (0-4) and b^^l^^ = 

{bi+b2)/2 = 3-9 (0-5). Similarly, defining b'^^l/k = bo^{bi/kM/k), the limits b^^^ = 

b2 = 5-4 (1-0) and =bi= 2-4 (0-4). 



With no criterion of choice, investigators have no firm interval estimate of j8. [Isobe 



et al.| ( [T990| ) state, "In cases like these, the astronomer would be wise to calculate [a num- 
ber of] regressions and be appropriately cautious regarding the confidence of the inferred 
conclusion." The strategy introduced in Section [3] inspires a more positive state of affairs. 
Figure |4] illustrates the interchange and scale invariant posterior density ([19]). The poste- 
rior median 3-6 favors the theoretical predictions /3 = 3 and j8 = 4 more than /3 = 2, but 
the shortest 95% probability interval (1-8, 6-1) confirms that more evidence is needed. 



4.3 Coverage probability 



Analysis of the Faber-Jackson data of Section |4.2| showed that the location and width of 
interval estimates around two interchange invariant point estimates of /3 vary with an arbi- 
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log a (km/s) 



Figure 3: Luminosities and velocity dispersions obtained from Schechter's (1980) mea- 
surements of the Faber- Jackson relation with the regression line based on the median of 
([19]) (bold) and the two ordinary least squares regression lines (dotted). 



trary choice of scale. Another limitation is that, because the sampling density identifies E 
but not /3, confidence intervals for point estimates b{S), all functions of sufficient statistics 
for E, reflect sampling variation about the population values b{T.) but not /3. The slope 
cannot be identified with the population value b{L) unless further restrictions apply. For 
instance, the identifying condition for the scale and interchange invariant geometric mean 
model mentioned in Section [T] is crl/crf = E22/E11, and the identifying condition for the 
orthogonal regression model is C7^ = C7|. Coverage of confidence intervals around these 
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d 




Figure 4: Posterior density of j8 in the Faber- Jackson relation with the median and 

shortest 95% probability interval. 



point estimates is expected to reach nominal levels when the identifying conditions hold, 
but otherwise how they will perform is uncertain. 

On the other hand, shortest posterior probability intervals calculated from ([19]) are 
marginalized over a distribution of variance parameters. Considering that the sampling 
density cannot simultaneously identify the slope and the variance parameters, how the 
posterior probability intervals will perform when applied to data characterized by a specific 
configuration of variance parameters is also uncertain. In this section, I use numerical 
simulation to study the empirical coverage probability of these intervals. 
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Table 1: Empirical coverage (%) of the shortest 90% posterior probability interval and 
nominal 90% confidence intervals of point estimates of /3 . 









Posterior 


Geometric 


OLS 


Orthogonal 


n 






density 


mean 


bisector 


regression 




0-05, 


1-00 


86-5 


52-5 


49-8 


77-3 




0-10, 


0-50 


89-9 


79-1 


78-2 


81-1 


20 


0-20, 


0-20 


92-8 


84-8 


84-6 


85-3 




0-50, 


0-10 


82-9 


64-0 


63-5 


651 




1-00, 


0-05 


72-2 


21-3 


24-4 


20-3 




0-05, 


1-00 


80-7 


7-8 


7-2 


22-4 




0-10, 


0-50 


83-9 


56-3 


55-8 


58-9 


50 


0-20, 


0-20 


94-6 


88-0 


88-0 


88-3 




0-50, 


0-10 


75-8 


40-4 


40-4 


410 




1-00, 


0-05 


54-4 


3-2 


3-4 


2-8 




0-05, 


1-00 


71-6 


0-3 


0-2 


0-9 




0-10, 


0-50 


75-5 


30-2 


30-1 


30-9 


100 


0-20, 


0-20 


96-6 


88-0 


88-0 


88-1 




0-50, 


0-10 


71-5 


23-3 


23-2 


22-8 




1-00, 


0-05 


42-1 


0-1 


0-1 


0-0 



Coverage based on 1000 random data sets. Geometric mean, OLS bisector, 
and orthogonal regression basic bootstrap confidence intervals estimated 
using 999 bootstrap replicates of each data set. Posterior density from ( fT9l ). 
Geometric mean estimate: sign(5i2)\/^i^2, b\ — Sn/Sw, b2 — fe/^n- 
OLS bisector estimate: tan(5(0i + 62)), 0i = arctan/ji, 02 = arctan/72- 
Orthogonal regression estimate: B + sign (512 )\/-B^ + L B = ^{b2 — l/bi). 
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I considered a number of sample size and measurement error settings, as shown in 
Table [T] For each setting, I generated 1000 data sets by randomly drawing the true values 
~ N(0, 1) and the errors {uu} ~ N(0, af) and {u2i} ~ N(0,a|), for z = 1, . . . I 
then constructed the observations {yu} and {^2!} from the model ([T]) and ([2]), with slope 
j8 = 1 and intercept a = 0. I calculated the shortest 90% posterior probability interval for 
j8 using ( [191 ), ^iid 90% basic bootstrap confidence intervals using 999 replicates of each 
data set for the geometric mean, OLS bisector, and orthogonal regression estimates (see, 
for example, Davison and Hinkley[ 1997j , Chapter 5). Table [T] shows the percentage of 



intervals that contained j8 = 1 . 

Table [T] shows that coverage of confidence intervals around the popular interchange- 
invariant point estimates of /3 reached the nominal level when 0\ = 02, the settings in 
which, for /3 = 1 and = 1, the identifying conditions held. Otherwise, coverage fell 
well short of the nominal level and worsened in larger samples. 

The posterior probability intervals exhibited broader coverage accuracy, overcovering 
in the vicinity of a^/c^ = and undercovering in more singular regions of the {01,02) 
sampling domain. Further investigation confirmed that the average coverage of the pos- 
terior probability interval over the limiting sampling density p{of,02 | /3,T^) o= l^l^''^ 
matched the nominal level. 

Previous studies have reported much better performance of the geometric mean, OLS 
bisector, and orthogonal regression estimates O Babu and Feigelson] , |1992[ Tables 2 and 
3; | Warton et al.[ [2006| Table 8). In these studies, however, replicate data was generated 



from known E, not j8 . Correspondingly, performance was measured relative to the iden- 
tified value b(L), not /3. Evaluated this way, the estimates perform well in general and 
increasingly well in larger samples, in direct contrast to the present findings. 



4.4 Method comparison 



The final example shows how the posterior density estimate (19) addresses a limitation 



of the Bland-Altman approach to method comparison studies ( |Altman and Bland] [1983 



Bland and Altman |1986 ). The data are from a study comparing two methods of estimating 
the fat content of 45 samples of human milk ( [Bland and Altman , 1999[ Table 3). One 
method (yi) is the standard Gerber method; the other (y2) relies on enzymic hydrolysis of 
triglycerides. Figure [5[ shows a scatter plot of the data. 

Standard practice in method comparison relies on the approach described in the highly 
influential publications of Altman and Bland ( 1983 ); [Bland and Altman ( 1986| ). The cen- 
terpiece of the method is the Bland-Altman plot, a plot of the differences against the means 
of the two methods. Figure [6l shows a Bland-Altman plot of the fat content data, with hor- 
izontal limits of agreement 1.96 standard deviations above and below the mean difference. 
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Figure 5: Fat content of human milk determined by the standard Gerber method and by 
enzymic hydrolysis of triglycerides with the regression line based on the median of ( [191 ) 
(bold) and the line of equality (dotted). 



The Bland- Altman plot, serving as a type of residuals plot for the identity model, provides 
a helpful perspective on the data. In this case, it gives no indication of overall bias, but it 
suggests a decreasing trend for the differences relative to the means. 

The Bland-Altman approach sidesteps the controversial use of regression and corre- 
lation (Dunn 2007 p by focusing on the measurements obtained rather than the quantities 
being measured. The measurements obtained, however, are affected not only by differ- 
ences in the quantities being measured but also by the errors. Indeed, the differences have 
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Figure 6: Difference versus average of the hydrolysis and Gerber methods of measuring 
fat content with 95% limits of agreement. 



variance T^(/3 — 1)^ + {o^ + a|) under the model Q, showing explicitly the contribu- 
tions of each of these effects. Importantly, the differences and means have covariance 
1 /2 [t^(/3^ ~ 1) + (o'l ~ *^?)]' showing that the measurements may differ systematically 
even if /3 = 1 , and also that the measurements may agree even if /3 7^ 1 . 

In the present example, the downward trend of Figure |6] hints that /3 < 1, but another 
possibility is that 0\> 02- The Bland- Altman approach cannot distinguish between these 
possibilities without specific information on the total errors. The posterior density shown 
in Figure|7]isolates the relation between the quantities being measured, offering substantial 
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evidence that j8 < 1, that is, the increments of the quantity being measured by the hydrol- 
ysis method are smaller than those of the quantity being measured by the Gerber method. 
The shortest 95% posterior probability interval for j8 is (0-953, 0-991) with median 0-972. 




"n I ' I ' I ' I T" 

0.92 0.94 0.96 0.98 1.00 1.02 



Figure 7: Posterior density of the slope /3 for the fat content data with the median and 
shortest 95% probability interval. 
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5 Discussion 



Often, the information in a set of observations and knowledge of the sampling density that 
generated it suffice to identify the parameters of interest. This is not true in the bivariate 
normal errors in variables estimation problem with unspecified errors. The elements of the 
mean /i and covariance matrix E are identified, but the slope j8 is not. Given jU and E, the 



sampling density Q does not vary with /3. As explained by Poirier ( 1998), the data are 
conditionally uninformative for /3. Fortunately, the nonnegativity conditions (|9]) modify 
the domain of E in a way that depends on /3 , so the data are marginally informative for 
/3. That is, the data are able to revise prior beliefs, as the examples in Section |4] clearly 
demonstrate. 

|Reiers0l ( 1950) recognized that the nonnegativity conditions (|9]) restrict the values of 
j8 given E. If E12 > 0, then /3 is restricted to the interval E12/E11 < /3 < E22/E12. If 
E12 < 0, the inequalities are reversed. Of course, the true covariance E is not given, so this 
fact is not directly useful for inference. In large samples, however, these bounds should be 



approximated by the two OLS estimates of j8, an observation |Reiers0l| credited to [Frisch 
( |1934[ ), also emphasized by [Lindley and El-Sayyad| ( [T968[ ). 

Looking at things the other way around, the nonnegativity conditions ([9]) restrict the 
values of E given /3. If j8 > 0, then E12 is restricted to the interval < E12 < min(/3Eii, 
E22/J8). If /3 < 0, then E12 is restricted to the interval max(/3Eii,E22/j8) < E12 < 0. A 
conditional prior density p(E | /3) will therefore have a normalization factor that depends 
on /3, and it will contribute at least this much additional information. A conditional prior 
density p(E | /3) in the inverted Wishart W^^(^07 Vq) form with /3 dependent normaliza- 
tion factor ([To]) will contribute just this much information in the limit of vanishing degrees 
of freedom; all information that would otherwise be carried by ^0 and Vq is lost. Ap- 
pendix 2 shows that this limit is possible at the very end of the calculation for any proper 
prior density p(j8). 

The price to pay is that, no matter how large the sample, the prior information carried 
by the nonnegativity conditions (|9]) must persist in order to identify j8. From the perspec- 
tive of /3 given E, the OLS bounds emphasized by Lindley and El-Sayyad ( 1968, ) do not 
converge. The sample can never completely overwhelm the joint prior density, and the 
posterior density cannot concentrate on a single point. 

Of course, any information inadvertently incorporated into the prior density will im- 
pact the posterior inference on /3 as well, so prior specification cannot be taken lightly. 
It is important to consider what is not known as well as what is. The calculation in Sec- 
tion [3] assumed that prior information on /3 is indifferent to continuous rotations of the 
coordinate plane, consistent with the rotational symmetry of the sampling density (|7]). The 
calculation went forward by specifying the uniquely rotationally invariant Cauchy prior 
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density pT] ) to the s cale inva riant slope j6 . I n contrast, the seemingly benign uniform prior 
density specified by Zellner in Section 4.1 implied a prior density p{d) oc (sec0)^ on the 
angle 9 = arctan/3 that introduced a preference for lines approaching the vertical. Such a 
preference cannot be desired in all circumstances. 

Prior information that is not invariant under continuous rotations of the coordinate 
plane may still be incorporated in a way that leaves the posterior density invariant under 
interchange of coordinates. Interchange invariant prior densities on the scale invariant 
slope are necessarily of the form /?(/3) = 1//3 V^(/3, 1//3), where \ir{xi,X2) = \lf{x2,xi) is 
symmetric with respect to its arguments. This is a broad class of densities that includes 
the rotationally invariant Cauchy density and the posterior density ( [201 ). ^ convenient way 
to incorporate prior beliefs on j8 is therefore to use ( [201 ) ^ prior density, choosing the 
degree of freedom and correlation parameters v and r that reflect these beliefs. As ( l20l ) 
can be calculated for any v > 1 and —1 < r < 1, no prior data are required. 

Exact inference from the posterior density ([9\ i is possible only when the unknown true 
values and errors are normally distributed. Whether or not the true values and errors are 
normally distributed, the normal model has the virtue that the resulting inferences provided 
by the marginal posterior density depend on these distributions only through the first and 
second moments of the sampled values, assumed to be finite (see, for example, Jaynes[ 
20031, Chapter 7). 



Section |2l described the consequences of a model that assumes either one of the true 
values of the coordinates is sampled from an improper uniform distribution with infinite 
variance. The slope is identified, but the identified value depends on which coordinate is 
marginalized out, an unacceptable solution to a problem that demands invariance under 
interchange of coordinates. By incorporating the assumption that the distributions of true 
values and errors have finite mean and variance, the normal model leads to a solution with 
the required symmetries; the drawback is that the sampling density does not identify j8. 

Compounding the dilemma, lReiers0ll ( 1 1 9501 ) showed that the normal model is the only 
model that does not identify /3 . As Lindley and El-Sayyad ( 1968 1 point out, this constitutes 
an acute sensitivity to distribution. In any nonnormal model, the limiting posterior density 
will concentrate; otherwise it will not. They also point out, however, that more accurate 
estimation is unlikely without specific information about the true values and errors actually 
sampled in the data at hand. 

Finally, looking forward, it appears to be straightforward to generalize the normal 
model solution to higher dimensions, starting from the matrix form of the conditional prior 
density ( [TOl ). The challenge will be to find a practical expression for the normalization 
factor ( l22l ). The approach of jKlepper and Leamerl ( 11984| ) may be particularly helpful in 
this effort. 
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Appendices 

Appendix 1: The normalization factor 

The objective is to calculate the normalization factor 

^(/3,v,^) = y"|E|-(^+3)/2exp[-itr(^E-i)]dE (22) 



of the density ( [T0| ), where m and E are positive-definite, symmetric 2x2 matrices, and 
R is the region defined by the nonnegativity conditions (|9]). In ([22]) and throughout this 
appendix, m may depend on j8; for convenience such dependence is suppressed in the 
notation. But for the restriction to the integration region R, the defining integral in ( [22] ) 
could be calculated in much the same way as Fisher ( jlQlS) ) first calculated the 2x2 case 
of the Wishart density. As it is, R breaks the defining integral in ( 122] ) into two separate 
parts 

i^(/3,v,^) = 

poo POO P^JLll 

/ / / |Er(^+3)/2exp[-itr(vi'E-i)]dEiidE22dEi2 

Jo JpLn JO 

poo poo f^ll/l^ 

+ / |Er(^+3)/2exp[-itr(^E-i)]dE22dEiidEi2, 

Jo Jl22/P^Jo ^ 

for the case /3 > 0, and 

ii:(/3,v,^) = 



(23) 



poo poo pQ 

/ / / |Er(^+3)/2exp[-itr(^E-i)]dEiidE22dEi2 

Jo Jp^Lu Jpl.u 

poo poo pQ 

+ / |Er(^+3)/2exp[-itr(vi'E-i)]dE22dEiidEi2, 

Jo JzjnlB^Jzo.lB 



(24) 



for the case j8 < 0. 

Introducing the reparameterization (^11,^225^12) — ^ (I^I^^'jO of ^, where 

r = ^12/(^11^22)^/', 
/ = (^22/^11)^/', 

([23]) and ^ have the form 

i^(i8,v,^)=iS:i(|j8|//,v,|^|,rsign(/3))+i^i(//|i8|,v,|^|,rsign(/3)), (26) 



(25) 
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where 



i^l(j8,V,|^|,r) 



OO /"OO 



J^'^-I.n Jo 



J:|-(v+3)/2 



X 



exp [- i j^j^L^ (Si 1 - 2rEi2 + E22)] dEi 1 dE22 dEi2. 



(27) 



I apply the technique described by Anderson ( 1984b Section 4.2) to the integral (27 ), 



changing variables (E22,Ei2) — )■ (m,v) by 

M = Ei2/Eii, 

V = E22 — E12/E11. 

Changing variables (En, v) — )■ {s,w) in the resulting expression by 

w=ii?>p[l/Eii + 2,(i<)/v], 



(28) 
(29) 



(30) 
(31) 



where R\y = \^\ ^/^/ Vl — r^, and the nonnegative quadratic function Qr{u) = (u — r)^ + 
(1 — r^) leads to 



ifi(/3, V, M,r) = 2^r(^)r(^)V 



X 



I e.(.)-(^+i)%^^^^,)(^,^)d.. 



(32) 



In ( [321 ), Iz{ci,b) is the regularized incomplete beta function (beta cumulative distribution 
function) dOlver et al.i|20T0l Section 8.17), and 



z{u,^,r) 



Qr{u) 



Finally, introducing the integration variable t by 



puts ( |32| ) in the form 



where 



t/Vv = (M-r)/Vl -r2 
ifi(j8,V,|^|,r)=//(v)|^r^/2/(j8,V,r), 

//(v) = v^2^r(^)r(V), 



(33) 

(34) 

(35) 
(36) 
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and I{^,v,r) is the following integral over the Student t probability density function 
Pt{t;v) with degrees of freedom v and F cumulative distribution function Pf{F;Vi,V2) 
with degrees of freedom Vi and V2 ( Abramowitz and Stegun[ 1964| , Sections 26.6 and 
26.7). 

rt+{P,v,r) 

/(/3,v,r)= / p,(?;v)/V(F(?,/3,v,r);v + l,v-l)dr, (37) 



where the integration limits are 



?_(v,r) = -v^r/v 1 -r2 



r+(j8,V,r) = vV(/3-r)/Vl-r2 



and 



F(?J,v,r) 



v-1 



V + 1 [r+(/3,v,r)-f_(v,r)]2-[r-f_(v,r)]2 
Substituting ([35]) into ( |26l ), the final expression for A'(j8, v, ^) is 

iS:(/3,v,^)=//(v)|^r^/27(/3,v,r,/), 

where 

7(/3,v,r,Z)=/(|/3|//,v,rsign(/3))+/(Z/|/3|,v,rsign(j8)). 



(38) 
(39) 

(40) 

(41) 
(42) 



Appendix 2: The noninformative limit 

Due to the factor H{v) in ( |4T] ), the reduced sampling density (|6]) cannot be evaluated in 
the limit Vq — )■ 0. However, these factors cancel out of the posterior density ([12]), leaving 



p{P\y)=pm 

p(/3) 



|^o(/3) + v5|-(^"+^)/V(i8, Vq + V,^o(i8) + vS) 
|^o(/3)|-^«/27(/3,Vo,^o(/3)) 



X 



|^o(/3) + v5 




+^)/27(/3,Vo + v,^o(i8) + v5) 




^o(/3) 


-W27(/3,vo,^o(i8)) ^ 



(43) 



In ( [43] ), 7(/3 , V, ^) is shorthand for the function 7(/3 , v, r, Z) of ( [42] ) in the parameterization 



([25])of^. 

The precision matrix ^o(j8) of ( [TT] ) is parameterized by 

|VI'o(/3)| = v2K4/32(l-p2), 

ro(^) =posign(j8), 
/o(i8) = |/3|. 



(44) 
(45) 
(46) 
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The integration limits ( [38] ) and (|39j) of the first integral /( |/3 1 / /q, Vq, ro) of 7(j8 , Vq, ro, /o) in 
(|42l) are therefore 



?_(vo,ro(/3)) = -v/WPo/(l-Po')'^ 
f+(|/3|/Zo(/3), vo,ro(/3)) = v^(l -po)/(l -pI)"\ 



(47) 
(48) 



independent of /3 whatever the value of po, as is the function F(?,/3//o, Vq, ro), defined 
in ( |40l ). Consequently, /(|/3|//o, Vo,ro) is independent of /3. The same reasoning can be 
applied to the second integral /(/o/|/3|,Vo,ro), and therefore 7(/3, Vo,^o(j8)) cancels out of 
the posterior density ( |43| ). Furthermore, from ( |44| ), the factors involving the determinant 
|^o(j8)| in ( [43] ) are well-behaved in the limit 



limJ^o(/3)| 

Vo^O 



-V()/2 



lim[v2K4i82(l-po-)] 

Vo^O 



-Vo/2 



1, 



(49) 



while ^o(/3) 
is therefore 



in the same limit. The noninformative limit of the posterior density ( [431 ) 

p(/3)7(/3,V,r,/) 



lim p(/3 I y) 

Vo^O 



(50) 



ro.M/3W,v,r,/)d/3' 

where the sample correlation coefficient r = Su/ {Si\S22Y^^^ and the ratio of standard de- 
viations I = (5'22/'S'ii)^/^ are sufficient statistics. From the properties of the F cumulative 
distribution function, the integrals in 7(/3, v, r, /) are well-defined for n> 2. 



Appendix 3: Invariance properties 

ISamuelson] ( |1942[ ) proved that the geometric mean of the two OLS estimates is the only 
point estimate of the slope consistent with the following three properties: (1) it must de- 
pend on the data only through the sample correlation coefficient and ratio of standard de- 
viations; (2) it must be invariant to interchange of the coordinates; (3) it must be invariant 
to a scale change of either coordinate. 

Consider a posterior density p(/3 | y) = f{fi,r,l) exhibiting property 1. If this density 
must also exhibit properties 2 and 3, then 

/(/3,r,/)=/(l//3,r,l/Z)//32, (51) 

and 

/(j8,r,/) = c/(c/3,r,c/), (52) 
for any c > 0. Simultaneous solutions of pT) and ( [52] ) are of the form 



/(/3,r,/)=g(j8,l/i8,r)/(j8/), 



(53) 
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where the scale invariant slope /3 = /3//, and g is any function symmetric in its first two 
arguments. The posterior density ( [191 ) is a particular case of ( |53| ) with 

~ ~ i6p(i8)7(i8,v,r,l) 
^(P,l/P.n = 77^ . (54) 

where /z(r) = /!°^p(j8)7(j8, v,r, l)dj8, and it is easily verified from ([14]) and ( [IT] ) that 
Pp{P) and7(j8, v,r, 1) are each symmetric with respect to j8 and l/j8. 



Appendix 4: Special cases 

In special cases, the posterior density ([19]) is available in closed form. For instance, starting 



(55) 



from (32) with n = 4, v = n — 1 = 3, it is straightforward to show that 



i^i(j8,3,|^|,r) =23|vp|-3/2(i_^2^3/2_^ 



1 



The normalization factor ( 26 ) becomes 



if(j8,3,^) =23|VI'|-3/2(i_^2)3/2 



l+/32 /32 -2r/3 + l 



j82-2rj8 + r 



(56) 



where /3 = /3 // is the scale invariant slope parameter, using the reparameterization ( [25] ) of 
^. Applying the limit Vq — )■ to ( [T2] ) as described in Appendix 2, the posterior density 
I y) = p(j8 I >')//, where 



p{^\y)=K{r) 



(57) 



I+/32 j82_2r/3 + l' 

i = [Sii/Siiy^'^ is the ratio of standard deviations, r = Si2/(Sii522)^'^^ is the sample 



correlation coefficient, and 



if(r)=F(l,l;|;r2)-i 



>• Vi — ^2 



(58) 

arcsmr 

which is continuous at r = 0, with value ^(0) = 1. In ( |58] ), F{a,b;c;z) is the Gauss 
hypergeometric function (Olver et al. 2010, Section 15.2). 
Similarly, in the case n = 6, 

1 |j8|(j82-rj8 + l) 



p{^\y)=K{r) 



I+/32 (/32-2r/3 + l)2 



where 



K[r)^F{lX\;r^)-\ 



(59) 
(60) 
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